index.chronicle | Text File | 1997-09-09 | 4KB | 68 lines
/* Copyright (c) 1994 Sun Wu, Udi Manber, Burra Gopal. All Rights Reserved. */
Started in Aug 1993.
0. bgopal: The new indexing mechanism is totally different from the original one
(written by udi and sun wu) -- the only thing common between the two is the
format of the index and the partitioning algorithm (v. simple algo).
1. Changed pirs.c/main()/line16 to make argc>1 check before accessing argv.
2. Added a leading bit in the index values to distinguish them from the next
word. This was mentioned but never implemented (comment in build_in.c).
3. Removed simple binary file and uuencoded file testing from filetype.c and
put it into a new file simpletest.c so that the compress module can use
it too.
4. Removed tolower in getword() so that I can index Linear, LINEAR and linear
depending on the relative frequency. Else, compression becomes a problem.
5. Added case-check (allupper, alllower, onlyfirstupper-restlower) routine
to getword() in getword.c -- does this only in case '-c' was specified.
6. Modified insert_h() and insert_index() procedures in build_in.c to store
the count of words rather than the partition numbers if CountWords == ON.
7. Modified pirs.c to take the option -c for CountWords instead of gathering
partition information (i.e., when we don't want .index_list for searching
but for the EXACT frequency of occurrence of different words).
8. Modified merge_in.c to merge counts of similar words occurring in two
different files rather than the partition numbers: the output of build_in
when the CountWords option is set is: a word followed by end-of-word-mark
followed by a list of (fprintf, not fwrite) counts separated by blanks,
ended with a newline.
9. Changed the files "everywhere" to account for malloc-failures (try again
after purging the hash-table once: if it fails again, THEN exit).
A. Changed the algorithm for build_hash -- it did not index all files.
Block-copied the code from the inner while loop to just after the loop
terminates.
B. Removed leading bit! Now sort gave problems on partition#0, so ignored
partition#0 altogether, just as partition#'\n' was ignored so that the end
of the current input line/word can be figured out.
C. Removed all references to pirs everywhere: it is now "glimpse" -- 1/28/94.
D. Bug fixes relating to $HOME not being there in the environment.
E. Bug fix related to "very small directories" (partitioning algorithm).
F. Fixed BIG bug related to memory leaks which can cause aborts... not sure if
this was the reason for deadlocks (schwartz's bug) but ran ok for 280MB.
G. Fixed a bug related to very small indices (with one partition only).
H. Added a facility to have one file per block, i.e., each file is in one
partition all by itself: a MAJOR change was done to many data-structures and
encode/decode functions were added so that sort/gets don't get confused.
-- bg, 23-30 Mar 1994
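The case-check in item 5 can be pictured with a short sketch. This is not the code from getword.c: the names word_case and classify_case are hypothetical, and the extra "other" category for mixed-case words is an assumption, since the chronicle lists only the three categories allupper, alllower and onlyfirstupper-restlower.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical categories; CASE_OTHER is an assumption for mixed case. */
enum word_case {
    CASE_ALL_LOWER,    /* "linear" */
    CASE_ALL_UPPER,    /* "LINEAR" */
    CASE_FIRST_UPPER,  /* "Linear" */
    CASE_OTHER         /* "LiNear" */
};

/* Classify a NUL-terminated word by counting its upper and lower
 * case letters. Non-letters (digits, '_') do not affect the result. */
enum word_case classify_case(const char *word)
{
    size_t i, upper = 0, lower = 0;
    int first_upper;

    assert(word != NULL);
    first_upper = isupper((unsigned char)word[0]) != 0;

    for (i = 0; word[i] != '\0'; i++) {
        unsigned char c = (unsigned char)word[i];
        if (isupper(c))
            upper++;
        else if (islower(c))
            lower++;
    }

    if (upper == 0)
        return CASE_ALL_LOWER;
    if (lower == 0)
        return CASE_ALL_UPPER;
    if (first_upper && upper == 1)
        return CASE_FIRST_UPPER;
    return CASE_OTHER;
}
```

With a routine like this, an indexer could keep a count for each spelling of a word and store whichever form is most frequent, which is the motivation given in item 4.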
I. In fast index, the old index may be destroyed and built again. In add to
index, it is never destroyed: things are only added to it. In add to index,
the old files are NOT checked for modification, etc., and all the new ones are
added, whereas in FastIndex even the new ones are checked for modification
date. In both, non-existent files are removed but the holes are not filled.
The fastest way to add a new set of files is to use -f. This is the same as
saying -f AND -a except that the old index is never rebuilt with -a. (The
index MIGHT need rebuilding if it was not found or partitions overflowed.)
(Does this make sense? :-)
-- bg, 20-22 Apr 1994
J. Changed STAT, MESSAGE, LOG (filenames) to STATFILE, MESSAGEFILE, LOGFILE
to avoid name clashes with some C-lib variables.
-- bg, 29 Apr 1994
K. Changed dir.c and partition.c to take care of absolute path names on the
command line itself: now, everything on the command line is forced to be
indexed (esp. symlinks which were excluded by default earlier).
-- bg, 2 May 1994
L. Increased maximum number of files that can be indexed to 254*254 = 64516.
-- bg, 4 May 1994
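The limit in item L is consistent with a two-byte, base-254 file number in which each byte avoids the two values that item B says confuse sort and gets: 0 and '\n'. The chronicle does not describe the actual encode/decode functions from item H, so the sketch below is only one encoding that yields 254*254 = 64516 values; every name in it is hypothetical.

```c
#include <assert.h>

/* 256 byte values minus the two reserved ones (0 and '\n'). */
#define SAFE_VALUES 254
#define MAX_FILES   (SAFE_VALUES * SAFE_VALUES)   /* 64516 */

/* Map a digit 0..253 to a byte in 1..9 or 11..255. */
unsigned char digit_to_byte(unsigned int d)
{
    unsigned int b = d + 1;   /* skip byte value 0 */
    if (b >= '\n')
        b++;                  /* skip byte value 10 ('\n') */
    return (unsigned char)b;
}

/* Inverse of digit_to_byte. */
unsigned int byte_to_digit(unsigned char b)
{
    return (b > '\n') ? (unsigned int)b - 2 : (unsigned int)b - 1;
}

/* Encode a file number 0..MAX_FILES-1 into two "safe" bytes. */
void encode_file_index(unsigned int n, unsigned char out[2])
{
    assert(n < MAX_FILES);
    out[0] = digit_to_byte(n / SAFE_VALUES);   /* high digit */
    out[1] = digit_to_byte(n % SAFE_VALUES);   /* low digit */
}

/* Decode two "safe" bytes back into the file number. */
unsigned int decode_file_index(const unsigned char in[2])
{
    return byte_to_digit(in[0]) * SAFE_VALUES + byte_to_digit(in[1]);
}
```

Because neither byte can be 0 or '\n', a line-oriented reader or a sort over the index text would not mistake an encoded file number for an end-of-string or end-of-line marker.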
M. Added ability to index structured files during June/July 1994.
N. Added ability to index compressed files, and automatically create compress
dictionaries (for cast) with -z option during Aug 1994.
O. Added user option -i to make include have higher priority than exclude
during Aug 1994.
P. Completed incremental indexing support during June 1995.